Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[extract_log] Improve extract_log script #559

Merged
merged 3 commits into from
Apr 3, 2018

Conversation

yxieca
Copy link
Collaborator

@yxieca yxieca commented Apr 2, 2018

Signed-off-by: Ying Xie ying.xie@microsoft.com

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Approach

How did you do it?

  • Some log entries has '\x00' sub-strings in front of
    time stamp, these sub strings are messing up with the
    comparator, adding a pre-handling to remove them.
  • Replace sort O(n x log(n)) with a look up (O(N)).

How did you verify/test it?
Run fast-reboot test on my dut.

- Some log entries has '\x00' sub-strings in front of
  time stamp, these sub strings are messing up with the
  comparator, adding a pre-handling to remove them.
- Replace sort O(n x log(n)) with a look up (O(n)).

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
target = target_lines[0] if len(target_lines) > 0 else None
for line in target_lines:
if comparator(line, target) > 0:
target = line
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference from

sorted(target_lines, cmp=comparator) 

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort is sorting the whole array. The best of sorting algorithm is O(N * Log(N)).

We only need the latest entry, which can be obtained by the loop in O(N).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a huge optimization by itself. I had to open the sort call to debug. And since we could get what we need faster, I figured why not :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case you can use max() function from python which is O(N) and just one-liner

Copy link
Collaborator Author

@yxieca yxieca Apr 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed in person. max doesn't take cmp=... parameter and the (key= lamda) method needs to extend string class.

for line in file:
# This might be a gunzip file issue, there has been '\x00's in front of the
# log entry timestamp which messes up with the comparator.
# Prehandle lines to remove these sub-strings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure these zeros from gzip file?
I suspect rotate logs procedure rotate syslog file while we're reading it.
I'd better to stop log rotator while we reading from syslog files.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it came from logrotate, but the thing is that I found these entries in old log files, not recent ones. That means if problem happened, it will stay there. Stopping log rotating while reading log won't fix the old log files.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative is to rm all *.gz before test. That doesn't guarantee newly rotated gz files never hit these issue during test though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't delete .gz files. Our default syslog file size is small. If the reboot was normal start of the reboot is in syslog.1
If we got any issues, you can find start of the reboot in syslog.2.gz or earlier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have to disable log rotate while we're running the script. Otherwise we could miss some information.
Also, I don't understand which component save zeros to syslog file and why. Generally I don't like to fix something, while I don't figure out source of the issue. Like in this case.
Also you fix could be only this long
result = [(filename, line) for line in file if target_string in line.replace('\x00', '') and 'nsible' not in line]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a much better format. Thanks Pavel!

@@ -90,9 +90,14 @@ def extract_line(directory, filename, target_string):
file = gzip.GzipFile(path)
else:
file = open(path)
result = None
result = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not required. The empty list will be overwritten.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, Either way is fine. The append function in caller takes both None and [] and the result is the same (in case no entry were found).

@yxieca yxieca self-assigned this Apr 3, 2018
@yxieca yxieca merged commit 0bc9e05 into sonic-net:master Apr 3, 2018
@yxieca yxieca deleted the extract_log branch April 3, 2018 22:54
praveen-li pushed a commit to praveen-li/sonic-mgmt that referenced this pull request Jun 20, 2019
* msft_github/master: (111 commits)
  add disconnect/connect vm to testbed-cli.sh (sonic-net#566)
  [dhcp_relay] Increase sleep duration to allow LAG and BGP to come up (sonic-net#565)
  [pfcwd]: support t0-116 and clean up the code (sonic-net#563)
  [fanout]: remove vrf management in arista fanout deploy templates (sonic-net#562)
  [fanout-switch-deploy] Support multiple speeds and port breakout (sonic-net#561)
  [extract_log] Improve extract_log script (sonic-net#559)
  Disable upgrade_sonic retry (sonic-net#560)
  Support for Sonic fanout (sonic-net#555)
  [Fanout deploy template] enable root user on fanout switches (sonic-net#557)
  [link state] match exact dut name in the link list (sonic-net#556)
  [pfcwd]: cache the ansible facts (sonic-net#554)
  Fix kernel version check (sonic-net#553)
  [pfcwd]: increase the pause waiting time and ingore snmp errors (sonic-net#551)
  fix typo (sonic-net#552)
  [dir_bcast] enable dir_bcast test on t0-116 topology (sonic-net#550)
  use command to gather host distribution, kernel version facts (sonic-net#549)
  [minigraph templte] use consistent VLAN subnet (sonic-net#546)
  [VM config] skip podset 0 tor 0 routing entry (sonic-net#545)
  Remove job minigraph_facts from boot_onie (sonic-net#548)
  [fast-reboot] pass VM IP in as ASCII strings (sonic-net#547)
  [fast-reboot test] fix syslog reading issue (sonic-net#543)
  add dataacl to minigraph template (sonic-net#544)
  [service_acl] Make test reliable when testing Arista service ACL solution (sonic-net#542)
  [pfcwd]: Iterate functional test over all ports (sonic-net#490)
  [service_acl] Detect expected output message even if it is followed by other text (sonic-net#541)
  [testbed]: Remove connection local for port_alias module (sonic-net#540)
  Revert "[minigraph-gen] fix AclInterface entries in minigraph (sonic-net#538)" (sonic-net#539)
  [minigraph-gen] fix AclInterface entries in minigraph (sonic-net#538)
  add retries in onie installation (sonic-net#537)
  Use connection plugin to install sonic image in ONIE. (sonic-net#536)
  [dhcp_relay]: Add --relax flag to ptf command (sonic-net#535)
  [minigraph_facts] use minigraph on DUT (sonic-net#534)
  Fix minigraph_facts: mkdir recursively (sonic-net#533)
  [minigraph_facts] use mingraph on DUT to test (sonic-net#532)
  generate minigraph based on topology file (sonic-net#531)
  Need to double-escape when using 'args' syntax (sonic-net#529)
  Fix improper 'local_action' syntax (sonic-net#528)
  Unify style of 'wait_for' actions across playbooks (sonic-net#527)
  [minigraph_facts] retrieving dhcp server list from vlan configuration instead of DhcpResources (sonic-net#526)
  [snmp_facts] increase get command timeout to fix cpu test failure (sonic-net#525)
  [everflow_test]: Add copy ptftests folder to use the remote.py file (sonic-net#522)
  Fix snmp_facts on PSU oid (sonic-net#520)
  Fix snmp queue test (sonic-net#519)
  [lag_test]: Remove the unnecessary testbed_type check (sonic-net#518)
  [ip_decap_test]: Support t0-64 topology (sonic-net#517)
  [acl_test]: Copy ptftests folder for the remote.py file (sonic-net#516)
  Add test case for PSU (sonic-net#514)
  [mtu]: Add t1-64-lag topology support for MTU test (sonic-net#513)
  [ip_decap]: Add t1-64-lag support in the script for the list of source port (sonic-net#512)
  [everflow]: Add missing spaces in ptf command (sonic-net#511)
  [acl_test]: Add ptf_platform_dir: ptftests to use customized platform code to support 64 ports (sonic-net#510)
  [crm]: Implement test for CRM (sonic-net#473)
  [everflow]: Add support for t1-64-lag topology (sonic-net#502)
  Fix sonic_image_version: get from sonic_version.yml, no dependency on grub (sonic-net#508)
  [topology]: Update t1-64-lag topology template to add AclInterfaces piece (sonic-net#505)
  Pull syncd-rpc with sonic version tag (sonic-net#507)
  Remove leading and trailing whitespaces when reading veos file (sonic-net#506)
  [lag_2] remove hard coded interval_count so it can be set by test (sonic-net#503)
  ptf_runner: Add one line comment for ptf_platform_dir (sonic-net#501)
  [testbed]: add port speed and fec configuration in sonic fanout (sonic-net#498)
  [typo]: Replace string t1-lag-64 with t1-64-lag (sonic-net#499)
  Adding sensor data for S6100 (sonic-net#496)
  [fib_test]: Add t1-64-lag src_ports in FIB test (sonic-net#497)
  Fix typo in acl test case name (sonic-net#494)
  Add one more Mellanox SKU string in everflow_tb_test script (sonic-net#495)
  Adding sensor data for Z9100 (sonic-net#492)
  [service_acl] Make test more robust and efficient (sonic-net#489)
  [dhcp relay test] adding more test scenarios (sonic-net#440)
  fix sanity check failed to recover (sonic-net#488)
  Fix PFC_WD test (sonic-net#479)
  add sonic fanout support (sonic-net#485)
  Update README.test.md
  Update README.test.md
  Fix table caption in testbed.csv and documentation (sonic-net#482)
  [test case] Add test: restart swss service (sonic-net#483)
  Add test case port toggle (sonic-net#484)
  [test infrastructure] allow overriding recover system actioin (sonic-net#480)
  [SNMP]add new SNMP counters tests to snmp.yml (sonic-net#477)
  add t0-52 topology (sonic-net#476)
  add command line option for creategraph.py (sonic-net#475)
  Add support for additional timestamp format (sonic-net#474)
  Ignore ansible output in extract_log (sonic-net#472)
  Add extract_logs action to concatenate logs after log rotate (sonic-net#471)
  Add ACL ICMP test (sonic-net#2) (sonic-net#465)
  [pfcwd]:add docker exec to avoid tty error (sonic-net#470)
  [test_tag]remove pfc_wd test tag from test by tag main yaml (sonic-net#469)
  [ansible]gather fact by default (sonic-net#468)
  [sensors]fix Dell S6000 sensors test fail (sonic-net#462)
  [vlan]improve test to wait a bit longer for config reload (sonic-net#463)
  one more place for hwsku of new Mellanox 2700 (sonic-net#464)
  [sensors]sensors test add new hwsku for Mellanox 2700 (sonic-net#461)
  [PFCWD]: test enhancement (sonic-net#456)
  [sensors] remove redundant sensor data definitin for sku MSN2100 (sonic-net#460)
  when call testcase by name, fetch all vms management info from testbed_facts (sonic-net#457)
  [acl test] error in ACL rule json file for destination ip (sonic-net#434)
  [sensor data] add sensor data for Mellanox MSN2410 and MSN2100 (sonic-net#449)
  Fix sku-sensors-data for 7050-QX32 (sonic-net#454)
  [deploy minigraph]add default enable BGP to deploy minigraph step (sonic-net#455)
  [upgrade]save bgp UP state after they are brough up (sonic-net#453)
  [reboot test] call sudo reboot to reboot dut (sonic-net#452)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants